Bayesian Linear Regression

Analysis of Flight Delay Data

Sara Parrish, Heather Anderson (Advisor: Dr. Seals)

Nov 20, 2024

Introduction

Objectives

  • Introduce Bayesian Linear Regression (BLR): Understand its principles and how it differs from traditional methods.

  • Explain Bayesian Concepts: Highlight Bayes’ Theorem, prior knowledge, and posterior distributions.

  • Discuss Practical Applications: Show how BLR is applied in analyzing real-world data, like airline delays.

  • Explore Advantages of Bayesian Methods: Quantifying uncertainty, improving predictions, and handling complex data.

  • Present Analysis Findings: Summarize key insights from our BLR model on time and weather-related airline delays.

What is Bayesian Linear Regression?

  • BLR: A statistical approach combining prior knowledge and new data.

  • Goal: Model relationships, make predictions, and handle uncertainty in estimates.

  • Difference from Traditional Methods: Probability-based estimates instead of fixed values.

Introduction to Bayesian Linear Regression

  • Regression under the frequentist framework
    • Independent variables are used to predict dependent variables
    • Linear regression finds the best-fitting line to observed data to make further predictions
      • Regression parameters (\beta) are assumed to be fixed
    • Only collected data is used for approximation
  • Regression under the Bayesian framework
    • Independent variables are used to predict dependent variables
    • Regression parameters (\beta) are not assumed to be fixed
    • Collected data is used alongside prior knowledge for approximation

Why Bayesian?

Advantages of Bayesian Linear Regression[1]

  • Incorporation of Prior Knowledge

  • Uncertainty Quantification

  • Expanded Hypotheses

  • Automatic Meta-Analyses

  • Improved Handling of Small Samples

  • Complex Model Estimation

Steps in Bayesian Linear Regression

  1. Model Specification: Define the linear relationship between the dependent and independent variables.

  2. Choose Priors: Select prior distributions for the model parameters, reflecting any existing knowledge about their values.

  3. Data Collection: Gather relevant data for the variables in the model.

  4. Model Fitting: Use computational methods, such as Markov Chain Monte Carlo (MCMC), to estimate the posterior distributions of the parameters based on the observed data.

  5. Result Interpretation: Analyze the posterior distributions to understand the relationships between variables, including estimating means and credible intervals.

Methods

Frequentist vs. Bayesian Approach

  • The Frequentist Approach
    • Typical linear model

Y = \beta_0 + \beta_1X + \varepsilon

  • Y : Dependent variable, the outcome
  • \beta_0 : y intercept
  • \beta_1 : The regression coefficient
  • X : Independent variable
  • \varepsilon : Random error [2]
  • \hat\beta provides a point estimate for the regression coefficient in the frequentist framework

Frequentist vs. Bayesian Approach

  • The Bayesian Approach
    • A regression is constructed using probability distributions, not point estimates as in the frequentist approach
    • Bayes Rule is used to inform the model [2]

p(B|A) = \frac{p(A|B)\cdot p(B)}{p(A)}

  • Bayes Rule allows for the calculation of inverse probability (p(B|A) \text{ from } p(A|B))
  • p(B|A) \text{ and } p(A|B) are conditional probabilities
  • p(A) \text{ and } p(B) are marginal probabilities [3]
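
A small numeric illustration of Bayes' rule may help here. The sketch below uses made-up probabilities (B = "storm at the airport", A = "flight is delayed"); the function name and all numbers are hypothetical and are not taken from the flight data.

```python
def bayes_posterior(p_a_given_b, p_b, p_a_given_not_b):
    """Return p(B|A) from p(A|B), p(B), and p(A|not B) using Bayes' rule."""
    # Marginal probability p(A) from the law of total probability.
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1.0 - p_b)
    return p_a_given_b * p_b / p_a


# Hypothetical inputs: p(delay | storm) = 0.8, p(storm) = 0.1,
# p(delay | no storm) = 0.2.
posterior = bayes_posterior(p_a_given_b=0.8, p_b=0.1, p_a_given_not_b=0.2)
print(round(posterior, 3))
```

With these inputs, a delay raises the probability of a storm from the marginal 0.1 to roughly 0.31, which is the "inverse probability" the slide refers to.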

The Bayesian Approach

Bayesian Inference can be written simply [4]

Posterior = \frac{Likelihood \times Prior}{Normalization}

  • The Prior is a model of prior knowledge on the subject
  • The Likelihood is the probability of the observed data given the model parameters
  • The Normalization is a constant that ensures the posterior distribution is a valid density function whose integration is equal to 1
  • The Posterior is the probability model that expresses an updated view of the model parameters
    • From the initial parameters of the prior
    • Updated with new data expressed in the likelihood function

The Bayesian Approach

  • A more formal expression of Bayes Rule applied for continuous parameters

\begin{align*} p(\theta|y) =& \frac{ L(\theta|y)p(\theta) }{p(y)}\\ \\ p(\theta|y) \propto & \text{ }L(\theta|y)p(\theta) \end{align*}

  • The normalization constant (p(y) above) ensures the posterior distribution is a valid distribution
    • The posterior density function can be written without this constant
  • The resulting prediction is not a point estimate, but a distribution [5]
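
The proportionality p(\theta|y) \propto L(\theta|y)p(\theta) can be checked numerically on a grid. The sketch below is only an illustration under simple assumptions: a single unknown mean \theta, a known error SD, a normal prior, and four made-up observations. None of these values come from the flight data.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def grid_posterior(data, grid, prior_mean, prior_sd, sigma):
    """Posterior weights over grid values of theta, for data ~ N(theta, sigma^2)."""
    unnormalized = []
    for theta in grid:
        log_likelihood = sum(math.log(normal_pdf(y, theta, sigma)) for y in data)
        # Likelihood times prior, before normalization.
        unnormalized.append(math.exp(log_likelihood) * normal_pdf(theta, prior_mean, prior_sd))
    total = sum(unnormalized)  # plays the role of p(y) on the grid
    return [u / total for u in unnormalized]


data = [4.0, 6.0, 5.0, 7.0]                 # made-up observations
grid = [i * 0.01 for i in range(1001)]      # theta from 0 to 10
post = grid_posterior(data, grid, prior_mean=0.0, prior_sd=10.0, sigma=2.0)
post_mean = sum(theta * p for theta, p in zip(grid, post))
print(round(post_mean, 2))
```

Dividing by the grid total is the only place the normalization enters, which is why the posterior shape can be studied without computing p(y) explicitly.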

The Bayesian Approach

  • The Bayesian Linear Regression Model
    • Changes based on:
      • distribution chosen for regression
      • distribution and hyperparameters chosen for priors
    • Our test cases model a continuous outcome, so we use a Normal model with conjugate normal priors
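
To show what conjugacy buys, here is a minimal sketch of the normal-normal update for a single mean. It assumes the error SD is known, which is a simplification: the models in this deck also place a prior on \sigma and are fitted by MCMC. The prior N(2, 36^2) is borrowed from the tuned intercept prior; the data values and sigma = 50 are hypothetical.

```python
def normal_normal_update(prior_mean, prior_sd, data, sigma):
    """Posterior mean and SD of a normal mean with known error SD sigma."""
    n = len(data)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    posterior_variance = 1.0 / (prior_precision + data_precision)
    sample_mean = sum(data) / n
    # Precision-weighted average of the prior mean and the sample mean.
    posterior_mean = posterior_variance * (
        prior_precision * prior_mean + data_precision * sample_mean
    )
    return posterior_mean, posterior_variance ** 0.5


# Hypothetical arrival delays (minutes); prior N(2, 36^2), assumed sigma = 50.
post_mean, post_sd = normal_normal_update(2.0, 36.0, [10.0, -5.0, 20.0, 3.0], 50.0)
print(round(post_mean, 2), round(post_sd, 2))
```

The posterior mean lands between the prior mean (2) and the sample mean (7), and the posterior SD is smaller than the prior SD, as expected when data are added.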

Role of Prior Knowledge in Shaping Predictions

  • Priors can be subjective or objective
    • objective is preferred
  • Uninformative priors can be used when there is not adequate prior knowledge
  • Discounted priors are the result of adjusting a known prior to better reflect the current data[3]

Figure from [3]

Understanding the Bayesian Framework

  • Bayes’ theorem is used to update prior beliefs about model parameters with new data
    • This results in a posterior distribution [5]
  • Posterior distribution vs. point estimates
    • measures the uncertainty in predictions
    • richer picture for predictions
    • better uncertainty quantification [4]

Interpreting the Posterior with Markov Chain Monte Carlo (MCMC)

What is MCMC?

  • A computational method to approximate posterior distributions.
  • Samples parameter values using prior and observed data.

How Does It Work?

  • Generates a sequence of dependent samples (Markov Chain).
  • Approximates posterior features like means and credible intervals.
  • Multiple chains and diagnostics help verify that the results are reliable.

Figure from [6]

Interpreting the Posterior with Markov Chain Monte Carlo (MCMC)

  • Importance:
    • Estimates complex posteriors.
    • Quantifies uncertainty with credible intervals.
    • Simulates predictions for future outcomes.
  • Some popular MCMC algorithms:
    • Gibbs sampler
    • Metropolis-Hastings [3]
    • Hamiltonian Monte Carlo [7]
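
As an illustration of the simplest of these algorithms, below is a minimal random-walk Metropolis sampler (a special case of Metropolis-Hastings with a symmetric proposal). It is a sketch, not the sampler used for the analysis, which relies on Stan's Hamiltonian Monte Carlo. The target density, step size, and seed are arbitrary choices for the demonstration.

```python
import math
import random


def metropolis_hastings(log_target, start, n_draws, step_sd, rng):
    """Random-walk Metropolis: returns a list of dependent draws."""
    draws = []
    current = start
    current_lp = log_target(current)
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_target(proposal)
        # Symmetric proposal, so the acceptance ratio is the density ratio.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        draws.append(current)  # a rejected proposal repeats the current value
    return draws


def log_target(theta):
    """Unnormalized log density of N(3, 1), standing in for a posterior."""
    return -0.5 * (theta - 3.0) ** 2


rng = random.Random(42)
draws = metropolis_hastings(log_target, start=0.0, n_draws=20000, step_sd=2.0, rng=rng)
kept = draws[2000:]  # discard warmup draws
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 1))
```

Only the unnormalized log density is needed, which is why MCMC works even when the normalization constant p(y) is intractable.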

Sara’s Prior Selection & Model Specification

Model 1: Continuous Predictor

\begin{align*} Y_i|\beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_i \end{align*}

Where:

  • Y_i is the arrival delay for the i-th flight
  • X_i is the departure time for the i-th flight, in minutes past midnight
  • \mu_i = \beta_0 + \beta_1X_i is the local mean arrival delay, specific to the departure time
  • \sigma^2 is the variance of the errors
  • \overset{\text{ind}}{\sim} indicates conditional independence of each arrival delay with the given parameters

Model 1: Continuous Predictor

The model can be written as

\begin{align*} Y_i|\beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_i \\ \beta_{0} &\sim N(m_0, s_0^2)\\ \beta_1 &\sim N(m_1, s_1^2)\\ \sigma &\sim \text{Exp}(l) \end{align*}

  • Regression parameters

    • Intercept: \beta_0 \sim N(m_0, s^2_0)
    • Slope: \beta_1 \sim N(m_1, s^2_1)
    • Error: \sigma \sim \text{Exp}(l)
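
The model above defines a posterior up to a constant: the sum of the log priors and the log likelihood. The sketch below writes that out directly. It is an illustration only; it applies the intercept prior to \beta_0 itself (no predictor centering), and the three data points are hypothetical.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def log_exponential_pdf(x, rate):
    """Log density of Exp(rate) at x; -inf outside the support."""
    return math.log(rate) - rate * x if x >= 0 else -math.inf


def log_posterior(beta0, beta1, sigma, xs, ys, m0, s0, m1, s1, rate):
    """Unnormalized log posterior for Y_i ~ N(beta0 + beta1 * X_i, sigma^2)."""
    if sigma <= 0:
        return -math.inf
    log_prior = (
        log_normal_pdf(beta0, m0, s0)
        + log_normal_pdf(beta1, m1, s1)
        + log_exponential_pdf(sigma, rate)
    )
    log_likelihood = sum(
        log_normal_pdf(y, beta0 + beta1 * x, sigma) for x, y in zip(xs, ys)
    )
    return log_prior + log_likelihood


# Hypothetical flights: departure time (minutes past midnight), arrival delay (minutes).
xs = [480.0, 810.0, 1200.0]
ys = [-3.0, 5.0, 12.0]
priors = dict(m0=2.0, s0=36.0, m1=0.02, s1=0.01, rate=0.02)
lp_plausible = log_posterior(-11.0, 0.02, 50.0, xs, ys, **priors)
lp_implausible = log_posterior(100.0, 0.02, 50.0, xs, ys, **priors)
print(lp_plausible > lp_implausible)
```

An MCMC sampler only ever needs a function like `log_posterior`; the hyperparameters m_0, s_0, m_1, s_1, and l enter through the prior terms.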

Model 2: Categorical Predictor

\begin{align*} Y_i|\beta_0, \beta_1, \ldots, \beta_6, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \ldots + \beta_6X_{i6} \\ \beta_{0} &\sim N(m_0, s_0^2)\\ \beta_j &\sim N(m_j, s_j^2), \quad j = 1, \ldots, 6\\ \sigma &\sim \text{Exp}(l) \end{align*}

Where

  • X_{i1}, X_{i2}, ..., X_{i6} are indicator variables for the day of the week.
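
A short sketch of the indicator coding may make the role of the reference level concrete. The ordering below (Tuesday as reference, \beta_1 = Wednesday through \beta_6 = Monday) follows the labels used in the results tables; the function name is hypothetical.

```python
# Non-reference levels in the order of beta_1 ... beta_6.
LEVELS = ["Wednesday", "Thursday", "Friday", "Saturday", "Sunday", "Monday"]
REFERENCE = "Tuesday"


def day_indicators(day):
    """Return [X_i1, ..., X_i6] for one flight's day of the week."""
    if day != REFERENCE and day not in LEVELS:
        raise ValueError(f"unknown day: {day}")
    # The reference day maps to all zeros, so its mean is the intercept beta_0.
    return [1 if day == level else 0 for level in LEVELS]


print(day_indicators("Tuesday"))
print(day_indicators("Friday"))
```

Each coefficient \beta_j is therefore a difference in mean arrival delay relative to Tuesday, not a mean in its own right.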

Tuning Hyperparameters

  • Mean, \mu = 2.12, and SD, \sigma = 36.42, of arrival delays were taken for departures within 10 minutes of the mean departure time.

\beta_{0c} \sim N(2, 36^2)

  • The coefficient, 0.019, and SE of the OLS regression were used.

\beta_{1} \sim N(0.02, 0.01^2)

  • The expected standard deviation was set approximately equal to the residual standard error of the regression (\approx 51.1), rounded so that E(\sigma) = \frac{1}{l} = 50

\sigma \sim \text{Exp}(0.02)
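
The arithmetic behind the rate parameter is small enough to check directly. This sketch assumes only that the mean of an Exp(l) distribution is 1/l; the helper name is hypothetical.

```python
def exponential_rate_for_mean(expected_sigma):
    """Rate l of an Exp(l) prior whose mean 1/l equals expected_sigma."""
    if expected_sigma <= 0:
        raise ValueError("expected_sigma must be positive")
    return 1.0 / expected_sigma


exact_rate = exponential_rate_for_mean(51.1)   # residual standard error from OLS
rounded_rate = round(exact_rate, 2)            # the value used in the prior
implied_mean = 1.0 / rounded_rate              # prior mean of sigma after rounding
print(round(exact_rate, 4), rounded_rate, round(implied_mean, 1))
```

Rounding the rate to 0.02 shifts the prior mean of \sigma from about 51.1 to 50, a negligible change for a prior this diffuse.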

The Updated Models

Model 1

\begin{align*} Y_i|\beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_i \\ \beta_{0} &\sim N(2, 36^2)\\ \beta_1 &\sim N(0.02, 0.01^2)\\ \sigma &\sim \text{Exp}(0.02) \end{align*}

Model 2

\begin{align*} Y_i|\beta_0, \beta_1, \ldots, \beta_6, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \ldots + \beta_6X_{i6} \\ \beta_{0} &\sim N(2, 46^2)\\ \beta_j &\sim N(0, 50^2)\\ \sigma &\sim \text{Exp}(0.02) \end{align*}

Statistical Programming

  • Data was analyzed with R [8] in RStudio [9], imported via CSV.
  • Libraries
    • rstanarm [11]
      • stan_glm() - simulation of model
      • posterior_predict() - simulation of posterior
    • bayesrules [12]
      • prediction_summary_cv() - evaluation of posterior
    • bayesplot [13]
      • Effective sample size ratio (neff_ratio()) and R-hat (rhat())

Heather’s Prior Selection & Model Specification

Prior Selection

  • Intercept (\beta_0): \beta_0 \sim N(0, 5^2). Assumes no strong baseline effect.

  • Slope (\beta_1): \beta_1 \sim N(0, 5^2). Reflects no strong prior belief about the relationship between weather incidents and delays.

  • Error Term (\sigma): \sigma \sim \text{Exp}(1). Accounts for variability in delays; allows flexibility.

Model Specification

Y_i \mid \beta_0, \beta_1, \sigma \sim N(\mu_i, \sigma^2), \quad \mu_i = \beta_0 + \beta_1 X_i

  • Y_i: Arrival delay (minutes)
  • X_i: Weather-related incidents

Sara’s Analysis

The Dataset

Header Description
Fl Date Flight Date (yyyy-mm-dd)
Airline Airline Name
Airline DOT Airline Name and Unique Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years.
Airline Code Unique Carrier Code
DOT Code An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation.
Fl Number Flight Number
Origin Origin Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused.
Origin City Origin City Name, State Code
Dest Destination Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused.
Dest City Destination City Name, State Code
CRS Dep Time CRS Departure Time (local time: hhmm)
Dep Time Actual Departure Time (local time: hhmm)
Dep Delay Difference in minutes between scheduled and actual departure time. Early departures show negative numbers.
Taxi Out Taxi Out Time, in Minutes
Wheels Off Wheels Off Time (local time: hhmm)
Wheels On Wheels On Time (local time: hhmm)
Taxi In Taxi In Time, in Minutes
CRS Arr Time CRS Arrival Time (local time: hhmm)
Arr Time Actual Arrival Time (local time: hhmm)
Arr Delay Difference in minutes between scheduled and actual arrival time. Early arrivals show negative numbers.
Cancelled Cancelled Flight Indicator (1=Yes)
Cancellation Code Specifies The Reason For Cancellation
Diverted Diverted Flight Indicator (1=Yes)
CRS Elapsed Time CRS Elapsed Time of Flight, in Minutes
Actual Elapsed Time Elapsed Time of Flight, in Minutes
Air Time Flight Time, in Minutes
Distance Distance between airports (miles)
Carrier Delay Carrier Delay, in Minutes
Weather Delay Weather Delay, in Minutes
NAS Delay National Air System Delay, in Minutes
Security Delay Security Delay, in Minutes
Late Aircraft Delay Late Aircraft Delay, in Minutes

Table 1

Table 1: Flight Delay Summary by Flight Period
Flight Period
Morning Afternoon Evening Total
TotalFlightsCount 1246031 (41.5%) 1423140 (47.4%) 330829 (11.0%) 3000000 (100%)
CancelledFlightsCount 30690 (38.8%) 38343 (48.4%) 10107 (12.8%) 79140 (100%)
DivertedFlightsCount 2555 (36.2%) 3901 (55.3%) 600 (8.5%) 7056 (100%)
AvgCRSDepTime 08:49:31 15:73:19 20:66:23 13:27:04
AvgDepTime 08:53:58 15:89:05 20:12:40 13:29:47
AvgDepDelay 5.23 12.93 16.51 10.12
AvgTaxiOut 16.87 16.44 16.65 16.64
AvgTaxiIn 7.75 7.78 6.95 7.68
AvgCRSArrTime 10:87:15 17:85:11 17:42:14 14:90:34
AvgArrTime 10:86:01 17:71:56 15:89:47 14:66:31
AvgArrDelay -0.77 7.34 10.04 4.26
AvgAirTime 114.12 109.8 116.31 112.31
CarrierDelayCount 86824 (29.2%) 162266 (54.6%) 47861 (16.1%) 296951 (100%)
SecurityDelayCount 887 (32.1%) 1434 (52.0%) 438 (15.9%) 2759 (100%)
WeatherDelayCount 8380 (26.7%) 18758 (59.7%) 4290 (13.7%) 31428 (100%)
NASDelayCount 80604 (31.4%) 144366 (56.3%) 31507 (12.3%) 256477 (100%)
LateAircraftDelayCount 42721 (16.5%) 168902 (65.2%) 47391 (18.3%) 259014 (100%)
Summary includes morning, afternoon, and evening flight periods.

Data Preprocessing

  • Notable changes:
    • Time format
    • Day of the week variable
    • Removal of cancelled and diverted flights

Modeling

The Normal Data Model: Departure Time Predictor

\begin{align*} Y_i|\beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_i \\ \beta_{0} &\sim N(2, 36^2)\\ \beta_1 &\sim N(0.02, 0.01^2)\\ \sigma &\sim \text{Exp}(0.02) \end{align*}

Modeling

The Normal Data Model: Week Day Predictor

\begin{align*} Y_i|\beta_0, \beta_1, \ldots, \beta_6, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \ldots + \beta_6X_{i6} \\ \beta_{0} &\sim N(2, 46^2)\\ \beta_j &\sim N(0, 50^2)\\ \sigma &\sim \text{Exp}(0.02) \end{align*}

Results

Table 2. Estimations of the Posterior Distributions’ Regression Coefficients.
Mean SE 95% CI
Model 1: Continuous Predictor
Flat Priors 𝛽₀ Intercept -10.92 0.47 (-11.85; -10.00)
𝛽₁ Departure Time 0.02 0.00 (0.02; 0.02)
𝜎 51.09 0.12 (50.86; 51.33)
Default Tuned Priors 𝛽₀ Intercept -10.94 0.47 (-11.86; -10.01)
𝛽₁ Departure Time 0.02 0.00 (0.02; 0.02)
𝜎 51.09 0.12 (50.86; 51.32)
Tuned Priors 𝛽₀ Intercept -11.66 0.17 (-12.02; -11.32)
𝛽₁ Departure Time 0.02 0.00 (0.02; 0.02)
𝜎 51.10 0.12 (50.87; 51.33)
Model 2: Categorical Predictor
Flat Priors 𝛽₀ Intercept (Tuesday) 1.57 0.44 (0.69; 2.43)
𝛽₅ Sunday 4.36 0.61 (3.21; 5.56)
𝛽₆ Monday 2.93 0.65 (1.72; 4.21)
𝛽₁ Wednesday 4.92 0.61 (3.72; 6.13)
𝛽₂ Thursday 2.66 0.65 (1.47; 3.88)
𝛽₃ Friday 1.86 0.62 (0.69; 3.05)
𝛽₄ Saturday 3.43 0.61 (2.25; 4.66)
𝜎 51.39 0.12 (51.16; 51.62)
Default Tuned Priors 𝛽₀ Intercept (Tuesday) 1.54 0.40 (0.73; 2.43)
𝛽₅ Sunday 3.48 0.58 (2.31; 4.70)
𝛽₆ Monday 1.89 0.60 (0.74; 3.04)
𝛽₁ Wednesday 4.40 0.60 (3.21; 5.58)
𝛽₂ Thursday 2.69 0.60 (1.48; 3.82)
𝛽₃ Friday 2.97 0.64 (1.75; 4.18)
𝛽₄ Saturday 4.96 0.59 (3.77; 6.13)
𝜎 51.40 0.12 (51.17; 51.62)
Tuned Priors 𝛽₀ Intercept (Tuesday) 1.54 0.44 (0.66; 2.43)
𝛽₅ Sunday 4.41 0.61 (3.23; 5.66)
𝛽₆ Monday 2.99 0.64 (1.73; 4.25)
𝛽₁ Wednesday 4.96 0.64 (3.76; 6.18)
𝛽₂ Thursday 2.70 0.61 (1.48; 3.89)
𝛽₃ Friday 1.91 0.64 (0.71; 3.15)
𝛽₄ Saturday 3.50 0.63 (2.28; 4.73)
𝜎 51.39 0.12 (51.17; 51.62)

Comparison of the Models

Table 3. Posterior predictive results from k-fold cross validation.
MAE MAE Scaled Within 50% Within 95%
     Model 1: Continuous Predictor
Flat Priors 15.730 0.313 0.841 0.966
Default Tuned Priors 15.779 0.314 0.840 0.966
Tuned Priors 15.668 0.312 0.849 0.966
     Model 2: Categorical Predictor
Flat Priors 17.110 0.338 0.866 0.965
Default Tuned Priors 17.080 0.338 0.866 0.966
Tuned Priors 17.118 0.339 0.867 0.966
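
For readers unfamiliar with these columns, the sketch below computes similar quantities from posterior predictive draws: the median absolute error between each observation and its posterior predictive median, and the share of observations inside the central 50% and 95% predictive intervals. It is a hedged re-implementation for illustration, not the code of prediction_summary_cv(), and it omits cross-validation and the scaled MAE. The toy data are made up.

```python
import statistics


def quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def prediction_summary(observed, predictive_draws):
    """Summarize predictive accuracy; predictive_draws[i] holds draws for observed[i]."""
    abs_errors, in_50, in_95 = [], 0, 0
    for y, draws in zip(observed, predictive_draws):
        ordered = sorted(draws)
        abs_errors.append(abs(y - statistics.median(ordered)))
        in_50 += int(quantile(ordered, 0.25) <= y <= quantile(ordered, 0.75))
        in_95 += int(quantile(ordered, 0.025) <= y <= quantile(ordered, 0.975))
    n = len(observed)
    return {
        "mae": statistics.median(abs_errors),
        "within_50": in_50 / n,
        "within_95": in_95 / n,
    }


# Toy example: the same predictive draws (-50, ..., 50) for three observations.
toy_draws = [float(v) for v in range(-50, 51)]
summary = prediction_summary([0.0, 30.0, 49.0], [toy_draws, toy_draws, toy_draws])
print(summary)
```

A well-calibrated model would place about 50% and 95% of observations inside the respective intervals.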

Effective Sample Size

Table 4. Effective sample size ratios for Model 1.
Priors 𝛽₀ Intercept 𝛽₁ Departure Time 𝜎
Flat 0.83 1.19 0.48
Default 0.80 1.10 0.80
Tuned 0.64 0.71 1.04
Table 5. Effective sample size ratios for Model 2.
Priors 𝛽₀ Intercept (Tuesday) 𝛽₅ Sunday 𝛽₆ Monday 𝛽₁ Wednesday 𝛽₂ Thursday 𝛽₃ Friday 𝛽₄ Saturday 𝜎
Flat 0.27 0.35 0.36 0.36 0.38 0.36 0.37 2.92
Default 0.37 0.59 0.55 0.60 0.61 0.56 0.58 0.66
Tuned 0.47 0.97 0.93 0.97 0.89 1.00 0.95 0.21

\hat{R} Analysis

Table 6. R-hat metric for Model 1.
Priors 𝛽₀ Intercept 𝛽₁ Departure Time 𝜎
Flat 0.9997 0.9996 1.0015
Default 1.0008 1.0005 1.0004
Tuned 1.0004 0.9996 1.0004
Table 7. R-hat metric for Model 2.
Priors 𝛽₀ Intercept (Tuesday) 𝛽₅ Sunday 𝛽₆ Monday 𝛽₁ Wednesday 𝛽₂ Thursday 𝛽₃ Friday 𝛽₄ Saturday 𝜎
Flat 1.0045 1.0036 1.0024 1.0027 1.0027 1.0028 1.0012 0.9994
Default 1.0015 0.9995 1.0002 1.0001 1.0015 0.9996 1.0010 0.9995
Tuned 1.0003 1.0008 0.9996 0.9995 1.0002 0.9997 0.9998 1.0056
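
To make the R-hat values above easier to read, here is a sketch of the classic Gelman-Rubin statistic, which compares between-chain and within-chain variance. This is an assumption-laden simplification: current Stan tooling reports a refined (split, rank-normalized) variant, so the numbers will not match exactly. The simulated chains are hypothetical.

```python
import random
import statistics


def r_hat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    within = statistics.fmean(statistics.variance(chain) for chain in chains)
    between = n * statistics.variance(chain_means)
    # Pooled variance estimate, then its ratio to the within-chain variance.
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(0)
# Four chains exploring the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Two of four chains stuck around a different value.
stuck = [[x + (5.0 if i < 2 else 0.0) for x in chain] for i, chain in enumerate(mixed)]
print(round(r_hat(mixed), 3), round(r_hat(stuck), 3))
```

Values near 1 indicate that the chains agree; this version can fall slightly below 1 because of the (n - 1)/n factor.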

OLS Comparison

Table 8. Model 1 Comparison: Bayesian and OLS
Estimate SE
Model 1: Continuous Predictor
Default Tuned Priors
𝛽₀ Intercept -10.94 0.47
𝛽₁ Departure Time 0.02 0.00
𝜎 51.09 0.12
OLS Model: Continuous Predictor
𝛽₀ Intercept -10.92 0.47
𝛽₁ Departure Time 0.02 0.00
Residual Standard Error 51.09
Table 9. Model 2 Comparison: OLS and Bayesian
Estimate SE
    Model 2: Categorical Predictor
Default Tuned Priors
𝛽₀ Intercept (Tuesday) 1.54 0.40
𝛽₅ Sunday 3.48 0.58
𝛽₆ Monday 1.89 0.60
𝛽₁ Wednesday 4.40 0.60
𝛽₂ Thursday 2.69 0.60
𝛽₃ Friday 2.97 0.64
𝛽₄ Saturday 4.96 0.59
𝜎 51.40 0.12
OLS Model: Categorical Predictor
𝛽₀ Intercept (Tuesday) 1.55 0.44
𝛽₅ Sunday 4.95 0.62
𝛽₆ Monday 2.68 0.62
𝛽₁ Wednesday 1.88 0.62
𝛽₂ Thursday 3.47 0.62
𝛽₃ Friday 4.40 0.61
𝛽₄ Saturday 2.96 0.64
Residual Standard Error 51.39

Posterior Predictive Checks: Model 1

Posterior Predictive Checks: Model 2

Heather’s Analysis & Results

Meet My Dataset!

Variable Description
year The year of the data.
month The month of the data.
carrier Carrier code.
carrier_name Carrier name.
airport Airport code.
airport_name Airport name.
arr_flights Number of arriving flights.
arr_del15 Flights delayed by 15+ minutes.
carrier_ct Carrier-caused delays.
weather_ct Weather-caused delays.
nas_ct NAS-related delays.
security_ct Security-caused delays.
late_aircraft_ct Delays from late aircraft.
arr_cancelled Number of canceled flights.
arr_diverted Number of diverted flights.
arr_delay Total arrival delay.
carrier_delay Delay attributed to the carrier.
weather_delay Delay attributed to weather.
nas_delay Delay attributed to the NAS.
security_delay Delay attributed to security.
late_aircraft_delay Delay from late-arriving aircraft.

Exploring the Data

Choosing Focus

Table 1: Summary of Flight Arrivals, Delays, Cancellations, and Diversions

Table 1: Summary of Flight Arrivals, Delays, Cancellations, and Diversions (August Data)
Characteristic Value
Total Months of Data (August) 1.00
Total Carriers 21.00
Total Arrived Flights (Count Data) 62,146,805.00
Total Delayed Flights (15+ min) 11,375,095.00
- Carrier Delays (31.34%) 3,565,080.59
- Weather Delays (3.39%) 385,767.94
- NAS Delays (29.21%) 3,322,432.52
- Security Delays (0.24%) 26,930.39
- Late Aircraft Delays (35.82%) 4,074,891.00
Total Cancelled Flights 1,290,923.00
Total Diverted Flights 148,007.00
Cancelled Flights (%) 2.08
Diverted Flights (%) 0.24

Code for Model

Trace Plots and Posterior Distributions

Model Parameters and Estimates

Parameter Estimate Standard Error 95% Credible Interval
Intercept -2116.53 7.67 [-2131.41, -2100.91]
Weather Count 1041.97 2.66 [1036.73, 1047.15]
Sigma 8676.19 15.52 [8646.95, 8706.92]
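
The interval column is a summary of posterior draws. A minimal sketch of an equal-tailed credible interval is shown below; it assumes the draws are already available as a list, and the toy draws are placeholders rather than output from the fitted model.

```python
def quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval: cut (1 - level) / 2 from each tail."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


# Placeholder draws 0, 1, ..., 1000 standing in for MCMC output.
toy_draws = [float(i) for i in range(1001)]
low, high = credible_interval(toy_draws)
print(round(low, 1), round(high, 1))
```

Unlike a frequentist confidence interval, this interval is read directly as the range containing the parameter with 95% posterior probability.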

Model Diagnostics and Fit Statistics

Statistic Value
Number of Observations 171,426
Model Family Gaussian
Formula arr_delay ~ weather_ct
Iterations 2000
Warmup 1000
Chains 4
Effective Sample Size (Bulk) [Intercept, Weather Count] [2102.722, 2000.139]
Effective Sample Size (Tail) [Intercept, Weather Count] [2095.692, 1858.849]
Posterior Mean of Weather Count Coefficient 1041.966
Posterior Median of Weather Count Coefficient 1041.971
Posterior Standard Deviation of Weather Count Coefficient 2.660956
95% Credible Interval for Weather Count Coefficient [1036.731, 1047.15]

Posterior Distribution for Weather Count Coefficient

Posterior Predictive Check

Conclusions

Sara’s Conclusions

  • Model 1: An increase of 1.2 seconds in arrival delay per minute past midnight

Y = -11.66 + 0.02X.

  • Model 2: Varying predicted arrival delays for the days of the week

Y = 1.54 + 4.96X_1 + 2.70X_2 + 1.91X_3 + 3.50X_4 + 4.41X_5 + 2.99X_6.

  • Residual Variability
    • \sigma \approx 51 for all models
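
The two fitted equations can be turned into predicted mean arrival delays. The sketch below plugs in the rounded posterior means reported on this slide, so results are approximate; the function names are hypothetical.

```python
def model1_mean(departure_minutes):
    """Predicted mean arrival delay (minutes) from Model 1."""
    return -11.66 + 0.02 * departure_minutes


# Model 2 offsets relative to the Tuesday reference (X_1 ... X_6).
MODEL2_OFFSETS = {
    "Tuesday": 0.0, "Wednesday": 4.96, "Thursday": 2.70, "Friday": 1.91,
    "Saturday": 3.50, "Sunday": 4.41, "Monday": 2.99,
}


def model2_mean(day):
    """Predicted mean arrival delay (minutes) from Model 2."""
    return 1.54 + MODEL2_OFFSETS[day]


slope_seconds = 0.02 * 60          # minutes per minute converted to seconds
afternoon = model1_mean(13 * 60 + 30)   # a 1:30 pm departure
wednesday = model2_mean("Wednesday")
print(round(slope_seconds, 1), round(afternoon, 2), round(wednesday, 2))
```

For a 1:30 pm departure, Model 1 predicts a mean arrival delay of about 4.5 minutes; Model 2 predicts about 6.5 minutes on a Wednesday versus 1.5 minutes on a Tuesday.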

Model Diagnostics

  • K-Fold Cross Validation

    • MAE of 15.7 and 17.1 \text{ minutes}
    • 96.6\% of observations within 95\% prediction intervals
  • Effective Sample Size

    • Effective sample size ratio N_\text{eff}/N > 0.1
  • \hat{R} \approx 1

  • MCMC Diagnostics

    • Good mixing and convergence
    • Consistency and unimodality
  • Autocorrelation

    • Weak correlation, rapid decay

Heather’s Conclusions

Key Findings

  • Intercept: -2116.53 (95% CI: [-2131.41, -2100.91])

    • Indicates significantly shorter delays without weather incidents.
  • Weather Count Coefficient: 1041.97 (95% CI: [1036.73, 1047.15])

    • A 1-unit increase in weather incidents is associated with an average increase of about 1042 minutes in total arrival delay.

    • Weather incidents are infrequent but highly disruptive.

  • Uncertainty Measures:

    • Residual variability: Standard deviation = 8676.19.

    • Suggests other unmeasured factors affecting delays.

  • Model Diagnostics:

    • Rhat = 1.00 for all parameters, indicating convergence.

    • Large effective sample sizes ensure reliable posterior estimates.

Conclusion

  • Key Insight:

    • Weather-related incidents, though infrequent, have a disproportionately large impact on delay times.

    • Highlights the need for better weather management and forecasting.

  • Bayesian Approach:

    • Accounts for uncertainty, providing credible intervals for estimates.

    • Supports informed decision-making in airline operations and policy-making.

Discussion and Future Research

  • What other factors could be included in the model?

  • How could expanding the dataset improve insights?

  • What advanced Bayesian methods could be explored?

  • How should outliers be addressed?

  • What assumptions should be revisited?

Thank You! Questions?

References

[1]
M. J. Zyphur and F. L. Oswald, “Bayesian estimation and inference,” J. Manage., vol. 41, no. 2, pp. 390–420, Feb. 2015.
[2]
X. Yan and X. G. Su, Linear regression analysis: Theory and computing. Singapore: World Scientific Publishing, 2009. Available: https://ebookcentral.proquest.com/lib/uwf/reader.action?docID=477274&ppg=318&pq-origsite=primo
[3]
E. Lesaffre and A. B. Lawson, Bayesian biostatistics, 1st ed. Somerset: John Wiley & Sons, Ltd, 2012. doi: https://doi.org/10.1002/9781119942412.
[4]
W. Koehrsen, “Introduction to bayesian linear regression.” https://towardsdatascience.com/introduction-to-bayesian-linear-regression-e66e60791ea7, Apr. 2018.
[5]
T. Bayes, “An essay towards solving a problem in the doctrine of chances. 1763,” 1763.
[6]
S.-S. Jin, H. Ju, and H.-J. Jung, “Adaptive markov chain monte carlo algorithms for bayesian inference: Recent advances and comparative study,” Structure and Infrastructure Engineering, Jun. 2019, doi: 10.1080/15732479.2019.1628077.
[7]
Stan Development Team, “RStan: The R interface to Stan.” 2024. Available: https://mc-stan.org/
[8]
R Core Team, R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2023. Available: https://www.R-project.org/
[9]
Posit team, RStudio: Integrated development environment for r. Boston, MA: Posit Software, PBC, 2024. Available: http://www.posit.co/
[10]
P. Zelazko, “Flight delay and cancellation dataset (2019-2023).” https://www.kaggle.com/datasets/patrickzel/flight-delay-and-cancellation-dataset-2019-2023, Nov. 2023.
[11]
S. Brilleman, M. Crowther, M. Moreno-Betancur, J. Buros Novik, and R. Wolfe, “Joint longitudinal and time-to-event models via Stan.” 2018. Available: https://github.com/stan-dev/stancon_talks/
[12]
M. Dogucu, A. Johnson, and M. Ott, Bayesrules: Datasets and supplemental functions from bayes rules! book. 2021. Available: https://github.com/bayes-rules/bayesrules
[13]
J. Gabry, D. Simpson, A. Vehtari, M. Betancourt, and A. Gelman, “Visualization in bayesian workflow,” J. R. Stat. Soc. A, vol. 182, pp. 389–402, 2019, doi: 10.1111/rssa.12378.

Index

Model 1

Flat Continuous Model
Default Continuous Model
Tuned Continuous Model
\beta_0 informs the model intercept

\beta_{0c} reflects the typical arrival delay at a typical departure time. With a mean departure time at \sim 1:30pm, the average arrival delay is \sim 2 minutes with a standard deviation \sim 36 minutes.

\beta_{0c} \sim N(2, 36^2)

\beta_1 informs the model slope

The slope of the OLS linear model indicates a 0.019 minute increase in arrival delay per minute increase in departure time, so we set m_1 = 0.02. The standard error, 0.0005, reflects high confidence, but so as not to over-constrain the model we set a wider prior standard deviation, s_1 = 0.01.

\beta_{1} \sim N(0.02, 0.01^2)

\sigma informs the regression standard deviation

To tune the exponential model, we set the expected value of the standard deviation, E(\sigma), equal to the residual standard error, \sim 50. With this, we can find the rate parameter, l.

\begin{align*} E(\sigma) &= \frac{1}{l} = 50\\\\ l &= \frac{1}{50} = 0.02\\\\ \sigma &\sim \text{Exp}(0.02) \end{align*}

\begin{align*} Y_i|\beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_i \\ \beta_{0} &\sim N(2, 36^2)\\ \beta_1 &\sim N(0.02, 0.01^2)\\ \sigma &\sim \text{Exp}(0.02) \end{align*}

Model 2

The Normal Data Model: Week Day Predictor

Flat Categorical Model
Default Categorical Model
Tuned Categorical Model

For arrival delays by day of the week, Figure 9 shows that mean arrival delays are between 1 and 7 minutes, while the median arrival delays are all negative, indicating a right skew (a long tail of large delays).

\beta_0 informs the model intercept

\beta_{0} reflects the mean arrival delay on Tuesday, our reference. The average arrival delay is \sim 2 minutes with a standard deviation \sim 46 minutes.

\beta_{0} \sim N(2, 46^2)

\beta_j informs the model slopes

For a categorical predictor with the stan_glm() function, the tuned prior, \beta_j, is applied to the estimation of each coefficient associated with the individual levels of the predictor (\beta_1, \beta_2, \ldots, \beta_6). For this reason, we set the coefficient prior to be weakly informative.

\beta_{j} \sim N(0, 50^2)

\sigma informs the regression standard deviation

To tune the exponential model, we set the expected value of the standard deviation, E(\sigma), equal to the residual standard error which is the same as with the previous model, \sim 50.

\begin{align*} E(\sigma) &= \frac{1}{l} = 50\\\\ l &= \frac{1}{50} = 0.02\\\\ \sigma &\sim \text{Exp}(0.02) \end{align*}

The tuned model is as follows,

\begin{align*} Y_i|\beta_0, \beta_1, \ldots, \beta_6, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \ldots + \beta_6X_{i6} \\ \beta_{0} &\sim N(2, 46^2)\\ \beta_j &\sim N(0, 50^2)\\ \sigma &\sim \text{Exp}(0.02) \end{align*}

Prior Summary: Model 1 Default Priors

Priors for model 'default_model_dt' 
------
Intercept (after predictors centered)
  Specified prior:
    ~ normal(location = 4.5, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 4.5, scale = 129)

Coefficients
  Specified prior:
    ~ normal(location = 0, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 0, scale = 0.43)

Auxiliary (sigma)
  Specified prior:
    ~ exponential(rate = 1)
  Adjusted prior:
    ~ exponential(rate = 0.019)
------
See help('prior_summary.stanreg') for more details

Prior Summary: Model 1 Flat Priors

Priors for model 'flat_model_dt' 
------
Intercept (after predictors centered)
 ~ flat

Coefficients
 ~ flat

Auxiliary (sigma)
 ~ flat
------
See help('prior_summary.stanreg') for more details

Prior Summary: Model 2 Default Priors

Priors for model 'default_model_dow' 
------
Intercept (after predictors centered)
  Specified prior:
    ~ normal(location = 4.5, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 4.5, scale = 129)

Coefficients
  Specified prior:
    ~ normal(location = [0,0,0,...], scale = [2.5,2.5,2.5,...])
  Adjusted prior:
    ~ normal(location = [0,0,0,...], scale = [364.46,361.83,370.01,...])

Auxiliary (sigma)
  Specified prior:
    ~ exponential(rate = 1)
  Adjusted prior:
    ~ exponential(rate = 0.019)
------
See help('prior_summary.stanreg') for more details